“Missing is Useful”: Missing Values in Cost-sensitive Decision Trees

Authors

Shichao Zhang

Zhenxing Qin

Charles X. Ling

Shengli Sheng

Abstract

Many real-world datasets for machine learning and data mining contain missing values. Much previous research regards this as a problem and attempts to impute the missing values before training and testing. In this paper, we study the issue in cost-sensitive learning that considers both test costs and misclassification costs. If the values of some attributes (tests) are too expensive to obtain, it is more cost-effective to leave them missing, much as expensive and risky tests are skipped (missing values) in patient diagnosis (classification). That is, “missing is useful”: missing values actually reduce the total cost of tests and misclassifications, so it is not meaningful to impute them. We discuss and compare several strategies that use only known values and exploit the idea that “missing is useful” for cost reduction in cost-sensitive decision tree learning.

This work is partially supported by Australian large ARC grants (DP0343109 and DP0559536), a China NSFC major research program (60496321), and a China NSFC grant (60463003).

• Shichao Zhang is with the Department of Computer Science at Guangxi Normal University, Guilin, China, and with the Faculty of Information Technology at University of Technology Sydney, PO Box 123, Broadway, Sydney, NSW 2007, Australia; zhangsc@it.uts.edu.au.
• Zhenxing Qin is with the Faculty of Information Technology at University of Technology Sydney, PO Box 123, Broadway, Sydney, NSW 2007, Australia; zqin@it.uts.edu.au.
• Charles X. Ling and Shengli Sheng are with the Department of Computer Science at The University of Western Ontario, London, Ontario N6A 5B7, Canada; {cling, ssheng}@csd.uwo.ca.
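The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names, the two-class setup, and all probabilities and costs below are hypothetical, chosen only to show how a test is worth skipping when its cost exceeds the misclassification cost it would save.

```python
# Illustrative sketch only: names and numbers are hypothetical,
# not taken from the paper.

def misclassification_cost(p_positive, fp_cost, fn_cost):
    """Expected cost of the cheaper label, given P(positive) = p_positive."""
    cost_if_predict_positive = (1 - p_positive) * fp_cost  # risk of a false positive
    cost_if_predict_negative = p_positive * fn_cost        # risk of a false negative
    return min(cost_if_predict_positive, cost_if_predict_negative)


def total_cost_with_test(test_cost, outcomes, fp_cost, fn_cost):
    """Test cost plus expected misclassification cost after seeing the result.

    `outcomes` is a list of (P(outcome), P(positive | outcome)) pairs.
    """
    expected = sum(
        p_outcome * misclassification_cost(p_pos, fp_cost, fn_cost)
        for p_outcome, p_pos in outcomes
    )
    return test_cost + expected


def should_skip_test(test_cost, p_positive, outcomes, fp_cost, fn_cost):
    """True when leaving the attribute value missing is at least as cheap."""
    cost_without = misclassification_cost(p_positive, fp_cost, fn_cost)
    cost_with = total_cost_with_test(test_cost, outcomes, fp_cost, fn_cost)
    return cost_without <= cost_with


# Hypothetical setting: prior P(positive) = 0.3; the test has two equally
# likely outcomes with posteriors 0.5 and 0.1 (0.5*0.5 + 0.5*0.1 = 0.3).
PRIOR = 0.3
OUTCOMES = [(0.5, 0.5), (0.5, 0.1)]
FP_COST, FN_COST = 200, 600

cheap_test_skipped = should_skip_test(20, PRIOR, OUTCOMES, FP_COST, FN_COST)
expensive_test_skipped = should_skip_test(100, PRIOR, OUTCOMES, FP_COST, FN_COST)
print("skip cheap test:", cheap_test_skipped)
print("skip expensive test:", expensive_test_skipped)
```

Under these assumed numbers, the cheap test pays for itself while the expensive one does not, so its value is better left missing than measured or imputed.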

Related resources

Ordered Estimation of Missing Values

When attempting to discover by learning concepts embedded in data, it is not uncommon to find that information is missing from the data. Such missing information can diminish the confidence on the concepts learned from the data. This paper describes a new approach to fill missing values in examples provided to a learning algorithm. A decision tree is constructed to determine the missing values of ea...

Cost Efficiency Measures In Data Envelopment Analysis With Nonhomogeneous DMUs

In conventional data envelopment analysis (DEA), it is assumed that all decision making units (DMUs) use the same input and output measures, meaning that the DMUs are homogeneous. In some settings, however, this usual assumption of DEA might be violated. A related problem is the problem of missing data where a DMU produces a certain output or consumes a certain input but the val...

Data Quality Improvement by Imputation of Missing Values

Having missing values in a data set is very common due to various reasons including human error, misunderstanding and equipment malfunctioning. Therefore, imputation of missing values is important to improve the quality of a data set. In our previous study we presented an imputation technique called DMI, which we then found better than an existing technique called EMI in terms of a few commonly...

Investigating the missing data effect on credit scoring rule based models: The case of an Iranian bank

Credit risk management is a process in which banks estimate the probability of default (PD) for each loan applicant. Data sets of previous loan applicants are built by gathering their data; these internal data sets are usually completed using external credit bureau data and finally used for estimating PD in banks. There is also continuing interest among banks in using rule based classifiers to b...

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees

Classification performance can degrade if data contain missing attribute values. Many methods deal with missing information in a simple way, such as replacing missing values with the global or class-conditional mean/mode. We propose a new iterative algorithm to effectively estimate missing attribute values in both training data and test data. The attributes are selected one by one to be complet...

Publication date: 2005
